Improving Bilayer Product Quantization for Billion-Scale Approximate Nearest Neighbors in High Dimensions

Authors

Artem Babenko

Victor S. Lempitsky

Abstract

The top-performing systems for billion-scale high-dimensional approximate nearest neighbor (ANN) search are all based on two-layer architectures that include an indexing structure and a compressed datapoints layer. The indexing structure is crucial because it makes it possible to avoid exhaustive search, while lossy data compression is needed to fit the dataset into RAM. Several of the most successful systems use product quantization (PQ) [4] for both the indexing and the dataset compression layers. These systems are, however, limited in how they exploit the interaction between the product quantization processes that take place at their different stages. Here we introduce and evaluate two approximate nearest neighbor search systems that both exploit the synergy of product quantization processes more efficiently. The first system, called Fast Bilayer Product Quantization (FBPQ), speeds up the baseline system (MultiD-ADC) by several times while achieving the same accuracy. The second system, Hierarchical Bilayer Product Quantization (HBPQ), provides significantly better recall for the same runtime at the cost of a small increase in memory footprint. For the BIGANN dataset of one billion SIFT descriptors, a 10% increase in Recall@1 and a 17% increase in Recall@10 are observed.
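For readers unfamiliar with the compression layer mentioned in the abstract, the following is a minimal sketch of plain product quantization with asymmetric distance computation (ADC). It is not FBPQ, HBPQ, or MultiD-ADC, and it omits the indexing layer entirely. All function names, the toy k-means, and the parameters (8-dimensional Gaussian data, `m=4` sub-vectors, `k=16` centroids per codebook) are assumptions chosen for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Toy Lloyd's k-means (illustrative only, not tuned)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[j].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def split(v, m):
    """Cut v into m contiguous chunks; assumes len(v) is divisible by m."""
    step = len(v) // m
    return [v[i * step:(i + 1) * step] for i in range(m)]

def train_pq(data, m, k):
    """Learn one codebook of k centroids for each of the m subspaces."""
    subspaces = zip(*(split(v, m) for v in data))
    return [kmeans(list(chunks), k) for chunks in subspaces]

def encode(v, codebooks):
    """Replace each chunk of v by the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(chunk, cb[c]))
            for chunk, cb in zip(split(v, len(codebooks)), codebooks)]

def adc_table(query, codebooks):
    """Distances from each query chunk to every centroid of its codebook."""
    return [[sq_dist(chunk, c) for c in cb]
            for chunk, cb in zip(split(query, len(codebooks)), codebooks)]

def adc_distance(table, code):
    """Approximate squared distance: one table lookup per subspace."""
    return sum(table[i][c] for i, c in enumerate(code))

# Hypothetical toy data: 200 random 8-dimensional vectors.
rng = random.Random(1)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
codebooks = train_pq(data, m=4, k=16)
codes = [encode(v, codebooks) for v in data]   # compressed dataset

query = data[0]
table = adc_table(query, codebooks)            # built once per query
best = min(range(len(codes)), key=lambda i: adc_distance(table, codes[i]))
```

In this sketch each vector is stored as four small integers instead of eight floats, and the query is compared against the compressed points through table lookups only. The scan over all codes at the end is the exhaustive search that an indexing structure is meant to avoid.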


Similar resources

Billion-scale similarity search with GPUs

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less p...


High-dimensional approximate nearest neighbor: k-d Generalized Randomized Forests

We propose a new data-structure, the generalized randomized k -d forest, or k -d GeRaF, for approximate nearest neighbor searching in high dimensions. In particular, we introduce new randomization techniques to specify a set of independently constructed trees where search is performed simultaneously, hence increasing accuracy. We omit backtracking, and we optimize distance computations, thus ac...


Polysemous Codes

This paper considers the problem of approximate nearest neighbor search in the compressed domain. We introduce polysemous codes, which offer both the distance estimation quality of product quantization and the efficient comparison of binary codes with Hamming distance. Their design is inspired by algorithms introduced in the 90’s to construct channel-optimized vector quantizers. At search time,...


Optimized Codebook Construction and Assignment for Product Quantization-based Approximate Nearest Neighbor Search

Nearest neighbor search (NNS) among large-scale and high-dimensional vectors has played an important role in recent large-scale multimedia search applications. This paper proposes an optimized codebook construction algorithm for approximate NNS based on product quantization. The proposed algorithm iteratively optimizes both codebooks for product quantization and an assignment table that indicat...


Parallel Algorithms for Nearest Neighbor Search Problems in High Dimensions

The nearest neighbor search problem in general dimensions finds application in computational geometry, computational statistics, pattern recognition, and machine learning. Although there is a significant body of work on theory and algorithms, surprisingly little work has been done on algorithms for high-end computing platforms and no open source library exists that can scale efficiently to thou...



Journal:

CoRR

Volume: abs/1404.1831


Publication date: 2014
